As a novel distributed learning paradigm, federated learning (FL) faces serious challenges in dealing with massive numbers of clients that have heterogeneous data distributions and heterogeneous computation and communication resources. Various client-variance-reduction schemes and client sampling strategies have been introduced to improve the robustness of FL. Among others, primal-dual algorithms such as the alternating direction method of multipliers (ADMM) have been found to be resilient to heterogeneous data distributions and to outperform most primal-only FL algorithms. However, the reason behind this remains unclear. In this paper, we first reveal that federated ADMM is essentially a client-variance-reduced algorithm. While this explains the inherent robustness of federated ADMM, its vanilla version cannot adapt to the degree of client heterogeneity. Moreover, under client sampling the global model at the server is biased, which slows down practical convergence. To go beyond ADMM, we propose a novel primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and the bias of the global model. In addition, FedVRA unifies several representative FL algorithms in the sense that they are either special instances of FedVRA or close to it. Extensions of FedVRA to semi-supervised and unsupervised learning are also presented. Experiments on (semi-)supervised image classification tasks demonstrate the superiority of FedVRA over existing schemes in learning scenarios with massive heterogeneous clients and client sampling.
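The claim that federated ADMM is a client-variance-reduced algorithm can be made concrete on a toy problem. The sketch below is standard consensus ADMM, not FedVRA, and it assumes hypothetical quadratic client losses f_i(x) = 0.5 * ||x - c_i||^2, full client participation, and a penalty rho = 1. At convergence each dual variable lam_i equals -grad f_i(z*) = c_i - z*, so the linear term <lam_i, x> in the client objective cancels that client's gradient drift at the optimum, much like a control variate.

```python
import numpy as np


def federated_admm(centers, rho=1.0, rounds=200):
    """Consensus ADMM on toy client losses f_i(x) = 0.5 * ||x - c_i||^2.

    The quadratic losses are a hypothetical stand-in chosen so that every
    client step has a closed form; the consensus optimum is mean(c_i).
    """
    c = np.asarray(centers, dtype=float)   # one row per client
    n, d = c.shape
    z = np.zeros(d)                        # global (server) model
    lam = np.zeros((n, d))                 # one dual variable per client
    for _ in range(rounds):
        # Client step: argmin_x f_i(x) + <lam_i, x - z> + (rho/2)||x - z||^2
        x = (c - lam + rho * z) / (1.0 + rho)
        # Server step: average the dual-corrected client models
        z = np.mean(x + lam / rho, axis=0)
        # Client dual ascent on the consensus constraint x_i = z
        lam = lam + rho * (x - z)
    return z, lam


centers = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]])
z, lam = federated_admm(centers)
print(z)    # close to the mean of the centers, [2, 4]
print(lam)  # close to c_i - z, i.e. -grad f_i(z): a per-client drift correction
```

With identical client centers the duals stay at zero and the method reduces to plain averaging; the duals only become active when clients disagree, which is where the variance-reduction view applies.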
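Recently, leveraging BERT pre-training to improve the phoneme encoder in text-to-speech (TTS) has drawn attention. However, these works use character-based units for pre-training to enhance the TTS phoneme encoder, which is inconsistent with TTS fine-tuning, where phonemes are the input. Pre-training with only phonemes as input can alleviate this input mismatch, but because of the limited phoneme vocabulary it lacks the ability to model rich representations and semantic information. In this paper, we propose Mixed-Phoneme BERT, a novel variant of the BERT model that uses mixed phoneme and sup-phoneme representations to enhance its learning capability. Specifically, we merge adjacent phonemes into sup-phonemes and combine the phoneme sequence and the merged sup-phoneme sequence as the model input, which enhances the model's capacity to learn rich contextual representations. Experimental results show that, compared with a FastSpeech 2 baseline, the proposed Mixed-Phoneme BERT significantly improves TTS performance, with a 0.30 CMOS gain. Mixed-Phoneme BERT achieves a 3x inference speedup and voice quality similar to the previous TTS pre-trained model PnG BERT.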
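The abstract states that the phoneme sequence and the merged sup-phoneme sequence are combined as model input, but not exactly how. A minimal sketch, assuming that each sup-phoneme embedding is repeated over the phonemes it covers and added element-wise to the phoneme embeddings, is shown below. The embedding tables, IDs, and the function name `mixed_phoneme_embeddings` are hypothetical; the segmentation into sup-phonemes is supplied as input rather than computed.

```python
import numpy as np


def mixed_phoneme_embeddings(phone_ids, sup_ids, sup_lengths, phone_table, sup_table):
    """Combine phoneme and sup-phoneme embeddings into one input sequence.

    Assumption: sup-phoneme k covers sup_lengths[k] consecutive phonemes, and
    the two streams are combined by repeating each sup-phoneme embedding over
    the phonemes it covers and adding it element-wise.
    """
    phone_emb = phone_table[np.asarray(phone_ids)]        # (T, d)
    sup_emb = sup_table[np.asarray(sup_ids)]              # (S, d)
    expanded = np.repeat(sup_emb, sup_lengths, axis=0)    # (T, d)
    if expanded.shape != phone_emb.shape:
        raise ValueError("sup_lengths must sum to the number of phonemes")
    return phone_emb + expanded


rng = np.random.default_rng(0)
phone_table = rng.normal(size=(50, 8))   # hypothetical phoneme vocabulary
sup_table = rng.normal(size=(200, 8))    # hypothetical sup-phoneme vocabulary
# Five phonemes merged into two sup-phonemes covering 3 and 2 phonemes
x = mixed_phoneme_embeddings([4, 17, 9, 30, 2], [101, 57], [3, 2], phone_table, sup_table)
print(x.shape)  # (5, 8)
```

Keeping the output at phoneme resolution means a downstream TTS encoder that expects one vector per phoneme can consume it unchanged, while the sup-phoneme term injects the larger-vocabulary context.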
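This paper introduces DelightfulTTS, the Microsoft end-to-end neural text-to-speech (TTS) system for the Blizzard Challenge 2021. The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal from two perspectives: first, we directly model and generate waveforms at a 48 kHz sampling rate, which brings higher perceptual quality than previous systems with 16 kHz or 24 kHz sampling rates; second, we model the variation information in speech through systematic design, which improves prosody and naturalness. Specifically, for 48 kHz modeling, we predict a 16 kHz mel-spectrogram in the acoustic model and propose a vocoder called HiFiNet that directly generates the 48 kHz waveform from the predicted 16 kHz mel-spectrogram, which benefits training efficiency, modeling stability, and voice quality. We systematically model variation information from both explicit (speaker ID, language ID, pitch, and duration) and implicit (utterance-level and phoneme-level prosody) perspectives: 1) for speaker and language IDs, we use lookup embeddings in training and inference; 2) for pitch and duration, we extract the values from paired text-speech data in training and use two predictors to predict the values in inference; 3) for utterance-level and phoneme-level prosody, we use two reference encoders to extract the values in training and two separate predictors to predict the values in inference. In addition, we introduce an improved Conformer block to better model local and global dependencies in the acoustic model. For task SH1, DelightfulTTS achieves a mean score of 4.17 in the MOS test and 4.35 in the SMOS test, indicating the effectiveness of the proposed system.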
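Predicting a 16 kHz mel-spectrogram but emitting a 48 kHz waveform only changes how many output samples the vocoder must generate per mel frame. The arithmetic is sketched below; the hop length of 200 samples (12.5 ms at 16 kHz) and the helper name `vocoder_upsample_factor` are hypothetical and not taken from the HiFiNet configuration.

```python
def vocoder_upsample_factor(mel_hop_length, mel_sample_rate=16_000, wav_sample_rate=48_000):
    """Output waveform samples the vocoder must emit per mel frame.

    One mel frame spans mel_hop_length / mel_sample_rate seconds; at the output
    rate that duration holds mel_hop_length * wav_sample_rate / mel_sample_rate samples.
    """
    samples = mel_hop_length * wav_sample_rate
    if samples % mel_sample_rate:
        raise ValueError("hop length does not map to a whole number of output samples")
    return samples // mel_sample_rate


# Hypothetical hop of 200 samples (12.5 ms) at 16 kHz
print(vocoder_upsample_factor(200))                          # 600 samples per frame at 48 kHz
print(vocoder_upsample_factor(200, wav_sample_rate=16_000))  # 200 for a 16 kHz vocoder
```

The acoustic model's frame rate stays the same; only the vocoder's total upsampling ratio grows by 48/16 = 3x.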
Visual place recognition (VPR) is usually treated as a specific image retrieval problem. Limited by existing training frameworks, most deep learning-based works cannot extract sufficiently stable global features from RGB images and rely on a time-consuming re-ranking step that exploits spatial structural information to improve performance. In this paper, we propose StructVPR, a novel training architecture for VPR that enhances structural knowledge in RGB global features and thus improves feature stability in constantly changing environments. Specifically, StructVPR uses segmentation images as a more definitive source of structural knowledge input to a CNN, and applies knowledge distillation so that neither online segmentation nor inference of the segmentation branch is needed at test time. Since not all samples contain high-quality, helpful knowledge, and some even hurt distillation performance, we partition the samples and weight each sample's distillation loss to enhance the expected knowledge precisely. As a result, StructVPR achieves impressive performance on several benchmarks using only global retrieval, and even outperforms many two-stage approaches by a large margin. With additional re-ranking, our method achieves state-of-the-art performance while maintaining a low computational cost.
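Weighting each sample's distillation loss can be sketched as a weighted mean of per-sample feature distances. The abstract does not give the distance or the partitioning rule, so this sketch assumes a squared L2 distance between the RGB (student) and segmentation (teacher) global features, and takes the per-sample weights as given by some hypothetical partitioning step; a weight of 0 simply drops a sample.

```python
import numpy as np


def weighted_distillation_loss(student, teacher, weights):
    """Per-sample weighted feature distillation using squared L2 distance.

    `weights` are assumed to come from a sample-partitioning rule; a weight
    of 0 drops a sample whose teacher knowledge is judged unhelpful.
    """
    student = np.asarray(student, dtype=float)   # (N, d) RGB-branch global features
    teacher = np.asarray(teacher, dtype=float)   # (N, d) segmentation-branch features
    weights = np.asarray(weights, dtype=float)   # (N,)
    per_sample = np.sum((student - teacher) ** 2, axis=1)
    total = weights.sum()
    return float(np.dot(weights, per_sample) / total) if total > 0 else 0.0


student = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
teacher = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]   # per-sample distances: 1, 0, 9
print(weighted_distillation_loss(student, teacher, [1, 1, 0]))  # 0.5: third sample ignored
print(weighted_distillation_loss(student, teacher, [1, 1, 1]))  # unweighted mean, 10/3
```

Normalizing by the weight sum keeps the loss scale comparable when the partition discards different numbers of samples per batch.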
High-order structures (cavities and cliques) of the gene network of influenza A virus reveal tight associations among viruses during evolution and are key signals of cross-species viral infection, which can cause pandemics. As indicators for sensing dynamic changes in viral genes, these higher-order structures have been a focus of attention in virology. However, viral gene networks are usually huge, and searching for these structures in them introduces unacceptable delays. To mitigate this issue, in this paper we propose HyperSearch, a simple yet effective deep learning model for searching cavities in a computable complex network for influenza virus genetics. Extensive experiments on a public influenza virus dataset demonstrate the effectiveness of HyperSearch over other advanced deep learning methods without any elaborate model crafting. Moreover, HyperSearch can complete the search in minutes, whereas 0-1 programming takes days. Since the proposed method is simple and easily transferable to other complex networks, HyperSearch has the potential to facilitate the monitoring of dynamic changes in viral genes and help humans keep pace with virus mutations.
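To see why exhaustive search is costly, consider the simplest candidates. Assuming cavities are meant in the algebraic-topology sense (holes in the clique complex of the network), every triangle is filled, so the smallest candidates for a 1-dimensional cavity are chordless 4-cycles. Not every such cycle is a cavity, since it can still be filled through other vertices, so the brute-force sketch below only enumerates candidates; it is a swapped-in illustration, not HyperSearch or the 0-1 programming baseline. Even this smallest case loops over all non-adjacent vertex pairs.

```python
from itertools import combinations


def chordless_4_cycles(edges):
    """Brute-force search for 4-cycles a-b-c-d-a with no chord a-c or b-d."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for a, c in combinations(sorted(adj), 2):
        if c in adj[a]:
            continue                      # a-c is an edge: it would be a chord
        common = sorted(adj[a] & adj[c])  # vertices adjacent to both a and c
        for b, d in combinations(common, 2):
            if d not in adj[b]:           # b-d must not be a chord either
                found.add(frozenset((a, b, c, d)))
    return found


square = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(chordless_4_cycles(square))             # {frozenset({1, 2, 3, 4})}
print(chordless_4_cycles(square + [(1, 3)]))  # set(): the chord splits it into triangles
```

Each cycle is reached once from each of its two diagonals, so storing the vertex set as a `frozenset` removes the duplicate.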
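Gastrointestinal endoscopic surgery (GES) places high demands on instrument size and distal dexterity because of the narrow endoscope channels and the tortuous human gastrointestinal tract. This paper uses nickel-titanium (NiTi) wires to develop a miniature 3-DOF (pitch-translation) flexible parallel robotic wrist (FPRW). In addition, we mounted an electrosurgical knife on the connection interface of the wrist and then used it to perform endoscopic submucosal dissection (ESD) in a porcine stomach. Effective performance in each ESD workflow demonstrates that the designed FPRW has sufficient workspace, high distal dexterity, and high positioning accuracy.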
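Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems. Existing methods mainly include contextual LM biasing and adding a bias encoder to end-to-end ASR models. In this work, we introduce a novel approach to contextual biasing that adds a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. Our proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We propose filtering algorithms to handle large context lists, as well as performance balancing mechanisms to control the model's degree of biasing. We demonstrate that the proposed model is a general biasing solution that is domain-insensitive and can be adopted in different scenarios. Experiments show that the proposed method achieves up to 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods. Compared with the AR solution, the proposed NAR model reduces model size by 43.2% and speeds up inference by 2.1x.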
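The abstract mentions filtering algorithms for large context lists without describing them. As a hedged stand-in, the sketch below keeps only the context phrases that closely match some span of the ASR hypothesis with the same number of words, scored with the character-level similarity ratio from Python's `difflib.SequenceMatcher`. The function name `filter_context`, the `top_k` and `min_ratio` parameters, and the example phrases are all hypothetical.

```python
from difflib import SequenceMatcher


def filter_context(hypothesis, context_list, top_k=3, min_ratio=0.7):
    """Keep the context phrases that best match some span of the ASR hypothesis.

    Similarity is difflib's character-level ratio between a phrase and each
    hypothesis window with the same number of words (a stand-in heuristic).
    """
    words = hypothesis.lower().split()
    scored = []
    for phrase in context_list:
        target = phrase.lower()
        n = len(target.split())
        best = 0.0
        for i in range(max(1, len(words) - n + 1)):
            window = " ".join(words[i:i + n])
            best = max(best, SequenceMatcher(None, window, target).ratio())
        if best >= min_ratio:
            scored.append((best, phrase))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [phrase for _, phrase in scored[:top_k]]


contexts = ["John Smith", "Alice Wong", "weather today"]
print(filter_context("call jon smyth tomorrow", contexts))  # ['John Smith']
```

Shrinking the list this way bounds the work done by a context encoder per utterance; the threshold trades recall of rare names against the size of the list passed on.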
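Visual place recognition is a challenging task for applications such as autonomous driving navigation and mobile robot localization. Distracting elements present in complex scenes often lead to deviations in the perception of visual places. To address this problem, it is crucial to integrate only information from task-relevant regions into the image representation. In this paper, we introduce TransVPR, a novel holistic place recognition model based on vision Transformers. It benefits from the desirable property of the self-attention operation in Transformers, which can naturally aggregate task-relevant features. Attention from multiple levels of the Transformer, which focuses on different regions of interest, is further combined to produce a global image representation. In addition, the output tokens from Transformer layers, filtered by the fused attention mask, are treated as key-patch descriptors, which are used to perform spatial matching to re-rank the candidates retrieved with the global image features. The whole model allows end-to-end training with a single objective and image-level supervision. TransVPR achieves state-of-the-art performance on several real-world benchmarks while maintaining low computation time and storage requirements.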
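Filtering output tokens with a fused attention mask can be sketched in a few lines. The abstract does not specify how attention from different levels is fused, so this sketch assumes a plain average over layers followed by a fixed threshold; the function name `select_key_patches`, the threshold value, and the toy arrays are hypothetical.

```python
import numpy as np


def select_key_patches(tokens, layer_attention, threshold):
    """Filter patch tokens with a fused attention mask.

    tokens:          (P, d) output patch tokens from a Transformer layer
    layer_attention: (L, P) per-layer attention weights over the P patches
    Assumption: fusion is an average over layers followed by a fixed threshold.
    """
    fused = np.mean(np.asarray(layer_attention, dtype=float), axis=0)  # (P,)
    mask = fused >= threshold
    return np.asarray(tokens)[mask], mask


tokens = np.arange(12, dtype=float).reshape(4, 3)   # 4 patches, 3-dim tokens
attn = np.array([[0.1, 0.6, 0.2, 0.1],
                 [0.3, 0.4, 0.2, 0.1]])             # two layers
key, mask = select_key_patches(tokens, attn, threshold=0.15)
print(mask)       # keeps patches 0, 1 and 2; patch 3 averages only 0.1
print(key.shape)  # (3, 3)
```

The surviving tokens are the local descriptors a re-ranking stage would match spatially between the query and each retrieved candidate.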
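The introduction of electronic trading platforms effectively changed the organization of traditional systematic trading from quote-driven to order-driven markets. Their convenience has led to an ever-growing amount of financial data, which is nevertheless hard to use for predicting future prices because of the low signal-to-noise ratio and the non-stationarity of financial time series. Simpler classification tasks, where the goal is to predict the direction of future price movements, require sufficiently reliable labels for supervised learning algorithms to generalize. However, labeling financial data is much less well defined than in other domains: did the price go up because of noise or because of signal? Existing labeling methods offer limited countermeasures against noise and have limited effect in improving learning algorithms. This work draws inspiration from image classification in trading and from the success of self-supervised learning. We investigate applying computer vision techniques to financial time series to reduce noise exposure and hence generate correct labels. We treat label generation as the pretext task of a self-supervised learning approach and compare the naive (and noisy) labels commonly used in the literature with labels generated by a denoising autoencoder for the same downstream classification task. Our results show that our denoised labels improve the performance of the downstream learning algorithm for both small and large datasets. We further show that the signals we obtain can be used to trade effectively with binary strategies. We suggest that, with the proposed techniques, self-supervised learning constitutes a powerful framework for generating "better" financial labels that are useful for studying the underlying patterns of the market.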
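The naive labels the abstract refers to are typically the direction of the next price move, which flips with every small fluctuation. The sketch below computes such labels and, for contrast, the same labels on a smoothed series. The moving average is only a simple stand-in for a denoiser, not the denoising autoencoder used in the work; the prices and function names are hypothetical.

```python
import numpy as np


def direction_labels(series, horizon=1):
    """Naive labels: 1 if the value `horizon` steps ahead is higher, else 0 (horizon >= 1)."""
    series = np.asarray(series, dtype=float)
    return (series[horizon:] > series[:-horizon]).astype(int)


def moving_average(series, window):
    """Simple smoother used here only as a stand-in for a learned denoiser."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")


prices = [10.0, 10.2, 10.1, 10.3, 10.2, 10.4]
print(direction_labels(prices))                     # [1 0 1 0 1]
print(direction_labels(moving_average(prices, 2)))  # [1 1 1 1]
```

On this upward-drifting but noisy series, the raw labels alternate while the smoothed labels follow the trend, which is the kind of label noise the denoising approach targets.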
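Embeddings are one of the fundamental building blocks of data analysis tasks. Embeddings are already essential tools for large language models and image analysis, and their use is extending to many other research domains. Generating these distributed representations is often a data- and computation-intensive process, yet their holistic analysis and adjustment after creation is still a developing area. In this paper, we first propose a very general quantitative measure of the presence of features in embedding data, based on whether those features can be learned. We then devise a method to remove or mitigate undesired features in the embedding while preserving the essential structure of the data. We use a domain adversarial network (DAN) to generate a non-affine transformation, but we add constraints to ensure that the essential structure of the embedding is preserved. Our empirical results show that the proposed algorithm significantly outperforms state-of-the-art unsupervised algorithms on several datasets, including novel applications from industry.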
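One simple way to make "presence of a feature, measured by whether it can be learned" concrete is a probing classifier: train a model to predict the feature from the embeddings and check its held-out accuracy against chance. The sketch below uses a NumPy logistic-regression probe; it is an illustrative assumption, not the paper's exact measure, and the function name, hyperparameters, and synthetic data are hypothetical.

```python
import numpy as np


def probe_accuracy(embeddings, feature, train_frac=0.7, steps=500, lr=0.5, seed=0):
    """Held-out accuracy of a logistic-regression probe predicting a binary feature.

    Accuracy well above the majority-class rate suggests the feature is
    learnable from (i.e. present in) the embeddings.
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(feature, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(train_frac * len(X))
    tr, te = idx[:cut], idx[cut:]
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):  # full-batch gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-(X[tr] @ w + b)))
        w -= lr * X[tr].T @ (p - y[tr]) / len(tr)
        b -= lr * np.mean(p - y[tr])
    pred = (X[te] @ w + b) > 0
    return float(np.mean(pred == y[te].astype(bool)))


rng = np.random.default_rng(1)
emb = rng.normal(size=(400, 5))                 # synthetic embeddings
encoded = (emb[:, 0] > 0).astype(int)           # feature written into dimension 0
unrelated = rng.integers(0, 2, size=400)        # feature absent from the embeddings
print(probe_accuracy(emb, encoded))    # near 1.0
print(probe_accuracy(emb, unrelated))  # near 0.5
```

The same probe can be rerun after a transformation that is meant to remove the feature: accuracy falling toward chance indicates the feature is no longer linearly learnable, although a stronger non-linear probe could still recover it.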